13 research outputs found

    Forward Stochastic Reachability Analysis for Uncontrolled Linear Systems using Fourier Transforms

    We propose a scalable method for forward stochastic reachability analysis of uncontrolled linear systems with affine disturbance. Our method uses Fourier transforms to efficiently compute the forward stochastic reach probability measure (density) and the forward stochastic reach set. The method applies to systems with bounded or unbounded disturbance sets. We also examine the convexity properties of the forward stochastic reach set and its probability density. Motivated by the problem of a robot attempting to capture a stochastically moving, non-adversarial target, we demonstrate our method on two simple examples. Where traditional approaches provide approximations, our method provides exact analytical expressions for the densities and the probability of capture. Comment: V3: HSCC 2017 (camera-ready copy), DOI updated, minor changes | V2: Review comments included | V1: 10 pages, 12 figures
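    As a rough illustration of the Fourier-transform idea, the sketch below assumes a scalar system x_{k+1} = a*x_k + w_k with a known initial state and i.i.d. disturbances. Under that assumption the characteristic function of x_k is a product of scaled disturbance characteristic functions, and the density follows by numerical Fourier inversion. The function names, the scalar setting, and the Gaussian disturbance are illustrative choices, not taken from the paper.

```python
import numpy as np

def reach_char_fn(omega, a, x0, k, disturbance_cf):
    """Characteristic function of x_k for x_{k+1} = a*x_k + w_k (scalar, i.i.d. w)."""
    # Deterministic part: x_k contains a^k * x0.
    psi = np.exp(1j * omega * (a ** k) * x0)
    # Each past disturbance enters x_k scaled by a^i, i = 0..k-1, and
    # independence turns the sum into a product of characteristic functions.
    for i in range(k):
        psi = psi * disturbance_cf((a ** i) * omega)
    return psi

def density_from_char_fn(x, omega, psi):
    """Numerical inversion: p(x) = 1/(2*pi) * integral of psi(w) * exp(-j*w*x) dw."""
    d_omega = omega[1] - omega[0]
    kernel = np.exp(-1j * np.outer(x, omega))
    return np.real(kernel @ psi) * d_omega / (2.0 * np.pi)

def gaussian_cf(std):
    """Characteristic function of a zero-mean Gaussian (assumed disturbance)."""
    return lambda w: np.exp(-0.5 * (std * w) ** 2)

# Hypothetical example values.
omega = np.linspace(-40.0, 40.0, 4001)
x = np.linspace(-3.0, 4.0, 71)
psi = reach_char_fn(omega, a=0.9, x0=1.0, k=5, disturbance_cf=gaussian_cf(0.5))
density = density_from_char_fn(x, omega, psi)
```

    For a Gaussian disturbance the result can be checked against the closed-form normal density; other disturbances only require swapping in a different characteristic function.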

    Balancing Sample Efficiency and Suboptimality in Inverse Reinforcement Learning

    We propose a novel formulation of the Inverse Reinforcement Learning (IRL) problem that jointly accounts for the compatibility of the identified reward with the expert behavior and its effectiveness for the subsequent forward learning phase. Although quite natural, especially when the final goal is apprenticeship learning (learning policies from an expert), this aspect has been completely overlooked by IRL approaches so far. We propose a new model-free IRL method that autonomously finds a trade-off between the error induced on the learned policy by a potentially sub-optimal reward and the estimation error caused by using finite samples in the forward learning phase; the latter can be controlled by also optimizing the discount factor of the related learning problem. The approach is based on a min-max formulation for the robust selection of the reward parameters and the discount factor, so that the distance between the expert’s policy and the learned policy is minimized in the subsequent forward learning task when a finite and possibly small number of samples is available. Unlike the majority of other IRL techniques, our approach does not require solving any planning or forward Reinforcement Learning problems. After presenting the formulation, we provide a numerical scheme for the optimization and show its effectiveness on an illustrative numerical case.

    Following Newton direction in Policy Gradient with parameter exploration

    This paper investigates the use of second-order methods to solve Markov Decision Processes (MDPs). Despite the popularity of second-order methods in the optimization literature, little attention has so far been paid to extending such techniques to sequential decision problems. Here we provide a model-free Reinforcement Learning method that estimates the Newton direction by sampling directly in the parameter space. To compute the Newton direction, we provide a formulation of the Hessian of the expected return, a technique for variance reduction in the sample-based estimation, and a finite-sample analysis for the case of a Normal distribution. Besides discussing the theoretical properties, we empirically evaluate the method on an instructional linear-quadratic regulator and on a complex dynamical quadrotor system.
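    A minimal sketch of sampling-based Newton estimation in parameter space, assuming policy parameters are drawn from a Gaussian with mean `mu` and a fixed isotropic standard deviation `sigma`. The gradient and Hessian of the expected return with respect to `mu` are estimated with the standard likelihood-ratio identities. The mean-return baseline is a generic variance-reduction choice and is not necessarily the technique proposed in the paper; all names and the quadratic test return are hypothetical.

```python
import numpy as np

def newton_update(mu, sigma, batch_return, n_samples, rng):
    """Estimate gradient and Hessian of E[R(theta)], theta ~ N(mu, sigma^2 I),
    and return them together with the Newton-updated mean."""
    d = mu.size
    thetas = mu + sigma * rng.standard_normal((n_samples, d))
    returns = batch_return(thetas)            # shape (n_samples,)
    adv = returns - returns.mean()            # simple baseline (assumption)
    score = (thetas - mu) / sigma ** 2        # gradient of log-density w.r.t. mu
    grad = (adv[:, None] * score).mean(axis=0)
    # Hessian identity: E[R * (score score^T + Hessian of log-density)],
    # where the Hessian of the Gaussian log-density w.r.t. mu is -I / sigma^2.
    hess = np.einsum("n,ni,nj->ij", adv, score, score) / n_samples
    hess -= adv.mean() * np.eye(d) / sigma ** 2
    # Newton step for maximization: mu - H^{-1} g.
    mu_new = mu - np.linalg.solve(hess, grad)
    return grad, hess, mu_new

# Hypothetical concave quadratic return with maximizer `target`.
Q = np.diag([1.0, 2.0])
target = np.array([0.5, 0.5])

def quadratic_return(thetas):
    diff = thetas - target
    return -np.einsum("ni,ij,nj->n", diff, Q, diff)

rng = np.random.default_rng(0)
mu0 = np.array([1.0, -1.0])
grad, hess, mu1 = newton_update(mu0, 0.5, quadratic_return, 400_000, rng)
```

    On a quadratic return a single exact Newton step reaches the maximizer, so `mu1` should land close to `target` up to sampling noise.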

    A classification-based approach to the optimal control of affine switched systems

    This paper deals with the optimal control of discrete-time switched systems, characterized by a finite set of operating modes, each associated with given affine dynamics. The objective is to design the switching law so as to minimize an infinite-horizon expected cost that penalizes frequent switching. The optimal switching law is computed offline, which allows efficient online operation of the control via a state feedback policy. The latter associates a mode with each state and, as such, can be viewed as a classifier. To train such a classifier-type controller, one first needs to generate a set of training data in the form of optimal state-mode pairs. In the considered setting, this involves solving a Mixed Integer Quadratic Programming (MIQP) problem for each pair. A key feature of the proposed approach is the use of a classification method that provides guarantees on the generalization properties of the classifier. The approach is tested on a multi-room heating control problem.

    A majority voting classifier with probabilistic guarantees

    This paper deals with supervised learning for classification. A new general-purpose classifier is proposed that builds upon the Guaranteed Error Machine (GEM). Standard GEM can be tuned to guarantee a desired (small) misclassification probability, which is achieved by letting the classifier return an unknown label. In the proposed classifier, the size of the unknown classification region is reduced by introducing a majority voting mechanism over multiple GEMs. At the same time, the possibility of tuning the misclassification probability is retained. The effectiveness of the proposed majority voting classifier is shown on both synthetic and real benchmark datasets, and the results are compared with other well-established classification algorithms.
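    The voting idea can be sketched as follows. The abstract does not specify the voting rule, so this sketch assumes a hypothetical one: each base classifier returns either a class label or an "unknown" marker, and the ensemble returns a label only when it wins a strict majority of all votes. The names `UNKNOWN` and `majority_vote` are illustrative.

```python
from collections import Counter

UNKNOWN = None  # marker a base classifier returns when it abstains

def majority_vote(votes):
    """Return the label backed by a strict majority of all votes, else UNKNOWN.

    `votes` holds one entry per base classifier: a class label or UNKNOWN.
    """
    counts = Counter(v for v in votes if v is not UNKNOWN)
    if not counts:
        return UNKNOWN
    label, n = counts.most_common(1)[0]
    # Abstentions count toward the total, so a label needs more than half
    # of all classifiers behind it (assumed rule, not the paper's).
    return label if 2 * n > len(votes) else UNKNOWN
```

    Under this rule a single abstaining classifier no longer forces an unknown output as long as enough of the others agree.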

    A data-based approach to power capacity optimization


    Optimal control to reduce emissions in gasoline engines: An iterative learning control approach for ECU calibration maps improvement

    Emission regulations for gasoline engines have become more stringent in recent decades, especially in Europe, posing new and important problems in the control of complex nonlinear systems. In this work, a preliminary investigation is conducted on the idea of exploiting Iterative Learning Control to optimize the calibration maps commonly used in the Engine Control Unit of gasoline engines. Starting from existing maps, we show how to refine them using a gradient-descent iterative learning control algorithm that accounts for additional constraints in the optimization problem. The outcome of this procedure is a control signal that can be integrated into a modified map. The performance of the proposed technique is validated on the provided training signal and cross-validated on different reference signals. Simulation results show the effectiveness of the approach.
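    A generic gradient-descent iterative learning control update can be sketched as below. This is a textbook-style illustration on an assumed linear plant in lifted form (y = G u over one trial), with simple box constraints on the input handled by clipping. It is not the engine model or the constrained formulation used in the paper, and the plant, reference, and bounds are hypothetical.

```python
import numpy as np

def lifted_plant(impulse_response):
    """Lower-triangular Toeplitz matrix G such that y = G @ u over one trial."""
    n = len(impulse_response)
    G = np.zeros((n, n))
    for i in range(n):
        G[i, : i + 1] = impulse_response[i::-1]   # G[i, j] = h[i - j]
    return G

def gradient_ilc(G, reference, u0, n_trials, u_min, u_max):
    """Projected gradient descent on 0.5 * ||reference - G u||^2 across trials."""
    step = 1.0 / np.linalg.norm(G, 2) ** 2        # 1 / Lipschitz constant
    u = np.clip(u0, u_min, u_max)
    error_norms = []
    for _ in range(n_trials):
        e = reference - G @ u                     # tracking error of this trial
        error_norms.append(np.linalg.norm(e))
        # Gradient step on the input, then projection onto the input bounds.
        u = np.clip(u + step * G.T @ e, u_min, u_max)
    return u, error_norms

# Hypothetical first-order plant and sinusoidal reference.
n = 50
h = 0.5 * 0.8 ** np.arange(n)
G = lifted_plant(h)
reference = np.sin(2 * np.pi * np.arange(n) / n)
u, errors = gradient_ilc(G, reference, np.zeros(n), 200, -5.0, 5.0)
```

    With this step size the tracking error does not increase from one trial to the next, which is the property that makes the update usable as a learning rule.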

    Analysis of Different Strategies for Lowering the Operation Temperature in Existing District Heating Networks

    District heating systems play an important role in increasing the efficiency of the heating and cooling sector, especially when coupled with combined heat and power plants. However, in the transition towards decarbonization, current systems face challenges in integrating Renewable Energy Sources and Waste Heat. In particular, a crucial aspect is the operating temperature of the network. This paper analyzes two approaches for lowering the operating temperatures of existing networks, which often supply old buildings with a low degree of insulation. A simulation model was applied to several case studies to evaluate how low-temperature operation of an existing district heating system performs compared to standard operation, considering two approaches: (1) a different control strategy involving nighttime operation to avoid the morning peak demand; and (2) the partial insulation of the buildings to decrease operating temperatures without the need to modify the users' heating systems. Different temperatures were considered to evaluate a threshold based on the characteristics of the buildings supplied by the network. The results highlight an interesting potential for optimizing existing systems by tuning the control strategies and carrying out energy efficiency measures. The network temperature can be decreased through continuous operation of the system or through energy efficiency interventions in buildings, and distributed heat pumps used as integration could provide significant advantages. Each solution has its own limitations and critical parameters, which are discussed in detail.